
Journal of Medical Imaging

SPIE-Intl Soc Optical Eng

Preprints posted in the last 90 days, ranked by how well they match Journal of Medical Imaging's content profile, based on 11 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
UCSF RMaC: University of California San Francisco 3D Multi-Phase Renal Mass CT Dataset with Tumor Segmentations

Sahin, S.; Diaz, E.; Rajagopal, A.; Abtahi, M.; Jones, S.; Dai, Q.; Kramer, S.; Wang, Z.; Larson, P. E. Z.

2026-02-12 radiology and imaging 10.64898/2026.02.11.26346096 medRxiv
Top 0.1%
6.3%

Current standard-of-care imaging practices cannot reliably differentiate among certain renal tumors, such as benign oncocytoma and clear cell renal cell carcinoma (RCC), or between low- and high-grade RCCs. Previous work has explored using deep learning, radiomics, and texture analysis to predict renal tumor subtypes and differentiate between low- and high-grade RCCs, with mixed success. To further this work, large, diverse datasets are needed to improve model performance and provide strong evaluation sets. In this work, a dataset of 831 multi-phase 3D CT exams was curated. Each exam contains up to three contrast-enhanced CT phases. Tumor outlines or bounding boxes were annotated and registered to the image volumes. The pathology results for each tumor and relevant patient metadata are also included.

2
Analysis of Augmentation Techniques for Spine X-Ray Images

Sivakumar, E.; Anand, A.

2026-04-17 radiology and imaging 10.64898/2026.04.15.26350121 medRxiv
Top 0.1%
6.3%

Computer vision and deep learning techniques, including convolutional neural networks (CNNs) and transformers, have increased the performance of medical image classification systems. However, training deep learning models using medical images is a challenging task that necessitates a substantial amount of annotated data. In this paper, we implement data augmentation strategies to tackle dataset imbalance in the VinDr-SpineXR dataset, which has a lower number of spine abnormality X-ray images compared to normal spine X-ray images. Geometric transformations and synthetic image generation using Generative Adversarial Networks are explored and applied to the abnormal classes of the dataset, and classifier performance is validated using VGG-16 and InceptionNet to identify the most effective augmentation technique. Additionally, we introduce a hybrid augmentation technique that addresses class imbalance, reduces computational overhead relative to a GAN-only approach, and achieves ~99% validation accuracy with both classifiers across all three case studies. Keywords: Data augmentation, Generative Adversarial Network, VGG-16, InceptionNet, Class imbalance, Computer vision, Spine X-ray, Radiology.
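For context, a minimal sketch of the geometric-transformation half of such an augmentation strategy, assuming PyTorch/torchvision; the rotation, translation, and scaling ranges and the oversampling factor are illustrative, not the values used in the paper:

```python
# Hedged sketch: geometric augmentation to oversample the minority
# (abnormal) class of spine X-rays. Parameter values are assumptions.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),           # small rotations
    transforms.RandomAffine(degrees=0,
                            translate=(0.05, 0.05),  # slight shifts
                            scale=(0.9, 1.1)),       # mild zoom
])

def oversample(images, n_variants=4):
    """images: list of torch.Tensor grayscale images, each shaped (1, H, W)."""
    out = []
    for img in images:
        out.append(img)                               # keep the original
        out.extend(augment(img) for _ in range(n_variants))
    return torch.stack(out)
```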

3
Performance of Naive Spectral Geometric Models in Histopathology AI

Leyva, A.; Niazi, M. K. K.

2026-02-02 pathology 10.64898/2026.01.30.702908 medRxiv
Top 0.1%
4.9%

There have been no systematic evaluations of purely spectral models for digital pathology tasks. We implemented and benchmarked four pipelines: binary classification on the BreaKHis dataset, multi-class region classification in glioblastoma, spatial transcriptomics, and denoising on 10x Visium data. Across all tasks, extensive cross-validation and grouped splits showed that purely spectral models did not improve performance over CNN-only baselines, but they offer useful complementary tools for interpretability and processing. Denoising showed strong performance, suggesting utility in data-scarce or heterogeneous image environments. Equivalence testing confirms that spectral and CNN model performances fall outside ±3% AUC. Fusion models between CNNs and spectral models show higher balanced accuracy. Spectral models failed to generalize across spatial transcriptomics tasks, with low correlation despite stable training loss. These findings represent a systematic negative result: despite their theoretical richness, spectral geometric features and SNO embeddings prove to be complementary features for WSI classification or segmentation. Reporting such outcomes is essential to establish empirical boundaries for spectral methods and to encourage future work on conditions or data modalities where these approaches may hold greater promise.
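For readers unfamiliar with equivalence testing on AUC, here is a hedged sketch of one common approach: bootstrap the paired AUC difference and compare its confidence interval to the ±3% margin. The paper's exact procedure is not specified here, so this is an illustration, not a reproduction:

```python
# Sketch: bootstrap CI on the paired AUC difference between a spectral
# model and a CNN baseline, checked against a +/-0.03 equivalence margin.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_difference_ci(y, p_spectral, p_cnn, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # resample cases with replacement
        if len(np.unique(y[idx])) < 2:         # AUC needs both classes present
            continue
        diffs.append(roc_auc_score(y[idx], p_spectral[idx])
                     - roc_auc_score(y[idx], p_cnn[idx]))
    return np.percentile(diffs, [2.5, 97.5])

# Equivalence within +/-3% AUC holds if the whole 95% CI lies inside the margin:
# lo, hi = auc_difference_ci(y_true, scores_spectral, scores_cnn)
# equivalent = (lo > -0.03) and (hi < 0.03)
```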

4
Quality versus quantity of training datasets for artificial intelligence-based whole liver segmentation

Castelo, A.; O'Connor, C.; Gupta, A. C.; Anderson, B. M.; Woodland, M.; Altaie, M.; Koay, E. J.; Odisio, B. C.; Tang, T. T.; Brock, K. K.

2026-02-18 radiology and imaging 10.64898/2026.02.17.26346486 medRxiv
Top 0.1%
4.3%

Artificial intelligence (AI) based segmentation has many medical applications, but limited curated datasets challenge model training; this study compares the impact of dataset annotation quality and quantity on whole-liver AI segmentation performance. We obtained 3,089 abdominal computed tomography scans with whole-liver contours from MD Anderson Cancer Center (MDA) and a MICCAI challenge. A total of 249 scans were withheld for testing, of which 30 (the MICCAI challenge data) were reserved for external validation. The remaining scans were divided into mixed-curation and highly curated groups, randomly sampled into sub-datasets of various sizes, and used to train 3D nnU-Net segmentation models. Dice similarity coefficients (DSC), surface DSC with 2mm margins (SD 2mm), the 95th percentile of Hausdorff distance (HD95), and 2D axial slice DSC (Slice DSC) were used to evaluate model performance. The highly curated 244-scan model (DSC=0.971, SD 2mm=0.958, HD95=2.98mm) showed no significant difference on 3D evaluation metrics from the mixed-curation 2,840-scan model (DSC=0.971 [p>.999], SD 2mm=0.958 [p>.999], HD95=2.87mm [p>.999]). The 710-scan mixed-curation model (Slice DSC=0.929) significantly outperformed the highly curated 244-scan model (Slice DSC=0.923 [p=0.012]) on the 30 external scans. Highly curated datasets yielded performance equivalent to datasets a full order of magnitude larger. The benefits of larger, mixed-curation datasets are evident in model generalizability metrics and local improvements. In conclusion, tradeoffs between dataset quality and quantity for model training are nuanced and goal dependent.
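For reference, a minimal sketch of two of the metrics named above (Dice and HD95) for binary 3D masks, assuming isotropic 1 mm voxels; surface DSC with a 2 mm tolerance follows the same surface-extraction idea and is omitted for brevity:

```python
# Sketch: Dice similarity coefficient and 95th-percentile Hausdorff
# distance for boolean 3D masks a, b (isotropic 1 mm voxels assumed).
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(a, b):
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hd95(a, b):
    # Surface voxels = mask minus its erosion.
    surf_a = a & ~binary_erosion(a)
    surf_b = b & ~binary_erosion(b)
    # Distance from every voxel to the nearest surface voxel of the other mask.
    dt_b = distance_transform_edt(~surf_b)
    dt_a = distance_transform_edt(~surf_a)
    d_ab = dt_b[surf_a]                 # surface A -> surface B distances
    d_ba = dt_a[surf_b]                 # surface B -> surface A distances
    return np.percentile(np.concatenate([d_ab, d_ba]), 95)
```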

5
Cross-Scanner Reliability of Brain MRI Foundation Model Embeddings: A Travelling-Heads Study

Navarro-Gonzalez, R.; Aja-Fernandez, S.; Planchuelo-Gomez, A.; de Luis-Garcia, R.

2026-03-25 radiology and imaging 10.64898/2026.03.23.26348808 medRxiv
Top 0.1%
3.7%

Foundation models (FMs) for brain magnetic resonance imaging (MRI) are increasingly adopted as pretrained backbones for clinical tasks such as brain age prediction, disease classification, and anomaly detection. However, if FM embeddings (internal representations) shift systematically across MRI scanners, downstream analyses built on them may reflect acquisition hardware rather than biology. No study has yet quantified this cross-scanner reproducibility. Here, we assess the cross-scanner reliability of brain MRI FM embeddings and investigate which design factors (pretraining strategy, network architecture, embedding dimensionality, and pretraining dataset scale) best explain the observed differences. Using the ON-Harmony travelling-heads dataset (20 participants, eight scanners, three vendors), we evaluate the embeddings of five architecturally diverse FMs and a FreeSurfer morphometric baseline via within- and between-scanner intraclass correlation coefficient (ICC), variance decomposition, and scanner fingerprinting. Reliability spanned the full spectrum: biology-guided models achieved good-to-excellent cross-scanner ICC (AnatCL: 0.970 [95% confidence interval (CI): 0.94, 0.98]; y-Aware: 0.809 [0.63, 0.88]), matching or surpassing FreeSurfer (0.926 [0.83, 0.96]), whereas purely self-supervised models fell below the poor threshold (BrainIAC: 0.453, BrainSegFounder: 0.307, 3D-Neuro-SimCLR: 0.247), with 23-58% of embedding variance attributable to scanner identity. The strongest correlate of cross-scanner reliability among the models evaluated was pretraining strategy: incorporating biological metadata (cortical morphometrics, age) into the contrastive objective produced scanner-robust embeddings, whereas architecture, dimensionality, and dataset scale did not predict reliability.
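For context, a sketch of one standard ICC variant: the two-way random-effects, absolute-agreement ICC(2,1) of Shrout and Fleiss, applied to a scalar score measured on n subjects across k scanners. Whether this is the exact variant used in the paper is an assumption:

```python
# Sketch: ICC(2,1) via a two-way ANOVA decomposition on a subjects x
# scanners matrix of one embedding-derived scalar score.
import numpy as np

def icc_2_1(Y):
    """Y: (n_subjects, k_scanners) array of a single scalar score."""
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)                                # per-subject
    col_means = Y.mean(axis=0)                                # per-scanner
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)      # subject MS
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)      # scanner MS
    resid = Y - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))            # error MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```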

6
Comparing Modelling Architectures in the Context of EGFR Status Classification in Non-Small Cell Lung Cancer

Anderson, O.; Hung, R.; Fisher, S.; Weir, A.; Voisey, J. P.

2026-02-17 radiology and imaging 10.64898/2026.02.16.26346059 medRxiv
Top 0.1%
3.6%

Radiogenomics enables the non-invasive characterisation of the genomic and molecular properties of tumours, with epidermal growth factor receptor (EGFR) mutations in non-small cell lung cancer (NSCLC) being one of the most investigated applications. In this study, we evaluate radiomics, contrastive learning, and convolutional deep learning approaches to predict EGFR mutation status from chest Computed Tomography (CT) images using the TCIA Radiogenomics dataset (n=115). Our results, using 10-fold cross-validation, demonstrate the capacity of imaging models to predict mutation status from CT data in a manner consistent with existing literature. Among the evaluated methods, models integrating radiomic and clinical features achieved the best performance, with an AUC of 0.790 and AUPRC of 0.517, outperforming both contrastive learning (AUC=0.787) and convolutional architectures (AUC=0.763). Beyond methodological comparisons, we discuss the challenges related to clinical translation. Specifically, we contrast radiogenomics with conventional tissue biopsies, and identify scenarios where radiogenomics might be useful, either independently or in conjunction with other existing diagnostic technologies. Together, these findings evidence the potential utility of radiogenomic EGFR models and provide direct architecture comparisons on the same dataset.
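As a hedged illustration of this evaluation setup, here is a 10-fold cross-validated AUC/AUPRC computation for a classifier on concatenated radiomic and clinical features; the logistic-regression model and the argument names are placeholders, not the authors' pipeline:

```python
# Sketch: 10-fold CV evaluation of a combined radiomic+clinical classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate(X_radiomic, X_clinical, y):
    """X_radiomic: (n, d1), X_clinical: (n, d2), y: (n,) mutation status."""
    X = np.hstack([X_radiomic, X_clinical])       # simple feature concatenation
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    p = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    return roc_auc_score(y, p), average_precision_score(y, p)
```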

7
An Exploratory Study of ResNet and Capsule Neural Networks for Brain Tumor Detection in MRI

Mensah, S.; Atsu, E. K. A.; Ammah, P. N. T.

2026-02-09 radiology and imaging 10.64898/2026.02.05.26345460 medRxiv
Top 0.1%
3.5%

Brain tumors are one of the most life-threatening diseases, requiring precise and timely detection for effective treatment. Traditional methods for brain tumor detection rely heavily on manual analysis of MRI scans, which is time-consuming, subjective, and prone to human error. With advancements in deep learning, Convolutional Neural Networks (CNNs) have become popular for medical image analysis. However, CNNs are limited in their ability to capture spatial hierarchies and pose variations, which reduces their accuracy, particularly for tasks like brain tumor segmentation where precise spatial relationships are crucial. This research introduces a hybrid Capsule Neural Network (CapsNet) and ResNet50 model designed to overcome the limitations of traditional CNNs by capturing both spatial and pose information in MRI scans. The proposed model leverages ResNet50 for feature extraction and CapsNet for handling spatial relationships, leading to more accurate segmentation. The study evaluates the model on the BraTS2020 dataset and compares its performance to state-of-the-art CNN architectures, including U-Net and pure CNN models. The hybrid model, featuring a custom 5-cycle dynamic routing algorithm to enhance capsule agreement for tumor boundaries, achieved 98% accuracy and an F1-score of 0.87, demonstrating superior performance in detecting and segmenting brain tumors. This study pioneers the systematic evaluation of the ResNet50 + CapsNet hybrid on the BraTS2020 dataset, with a tailored class weighting scheme addressing class imbalance and improving effectiveness in identifying irregularly shaped tumors and smaller tumor regions. The study offers a robust solution for automating brain tumor detection. Future work will explore the use of Capsule Networks alone for brain tumor detection in MRI data and investigate alternative Capsule Network architectures, as well as their integration into clinical decision support systems.
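For context, an illustrative routing-by-agreement loop with five iterations, matching the "5-cycle dynamic routing" mentioned above; the tensor shapes and the squash formulation are assumptions, not the authors' exact implementation:

```python
# Sketch: capsule routing-by-agreement with 5 iterations.
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Shrinks short vectors toward zero, keeps long vectors near unit length.
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1 + norm2)) * s / torch.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iters=5):
    """u_hat: lower-capsule predictions, shape (B, n_in, n_out, d)."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)    # routing logits
    for _ in range(n_iters):
        c = F.softmax(b, dim=2)                   # coupling over output capsules
        s = (c.unsqueeze(-1) * u_hat).sum(1)      # weighted sum -> (B, n_out, d)
        v = squash(s)                             # output capsule vectors
        b = b + (u_hat * v.unsqueeze(1)).sum(-1)  # agreement update
    return v
```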

8
AI-Based Pipeline for the Segmentation of White Matter Hypoattenuations in CT Scans: A Design-Choice Validation

Alamoudi, N.; Valdes Hernandez, M. d. C.; Seth, S.; Jin, B.; Sakka, E.; Arteaga-Reyes, C.; Mair, G.; Jaime-Garcia, D.; Cheng, Y.; Jochems, A. C. C.; Wardlaw, J. M.; Bernabeu Llinares, M. O.

2026-03-11 neurology 10.64898/2026.03.10.26348006 medRxiv
Top 0.1%
3.3%

Purpose: White matter hyperintensities are a key imaging marker of vascular pathology, defined on brain magnetic resonance imaging (MRI) and typically manifesting on non-contrast computed tomography (CT) as subtle white matter hypoattenuation (WMH). Accurately segmenting WMH in CT scans remains challenging due to their low contrast with the surrounding tissue. This work presents an end-to-end framework for WMH segmentation in CT scans and validates the design choices in each step of the processing pipeline. We leverage a state-of-the-art deep-learning method combined with manually annotated and pseudo-labelled datasets from paired CT-MRI scans from different clinical scanners to deliver reliable outcomes. Approach: Our framework includes DICOM data curation, sequence selection, and automatic label generation as preparation steps. Preprocessing includes z-score intensity normalisation, skull stripping, CT windowing, and two-step CT-MRI registration to accurately transfer MRI-derived labels into the CT space. Further processing involves the use of a 3D nnU-Net initially trained on CT images with aligned, manually derived MRI-based WMH labels (n=91) and fine-tuned with two additional pseudo-labelled datasets (n=191). Findings: CT-based WMH volumes showed a near-perfect correlation with ground-truth MRI WMH volumes (r = 0.98), with a systematic overestimation (mean difference = 2.40 mL; 95% limits of agreement: -8.31 to 13.11 mL) that may be adjustable in downstream tasks. This overestimation reflected challenges in the precise delineation of small WMH lesions and confounding from other imaging markers of brain disease. Across the evaluated cohort, ground-truth WMH volumes ranged from 1.02 to 149.34 mL. The best-performing configuration achieved a mean absolute error below 3 mL, corresponding to approximately 17% of the mean WMH volume, and a mean Dice similarity coefficient of 0.57. Segmentation accuracy decreased in the presence of stroke lesions. Models trained on single-pathology datasets, as well as approaches relying on template-based spatial normalisation, did not achieve satisfactory performance despite using the same backbone network configuration. Conclusion: Using a multi-centre dataset and a multi-modal approach with expert-annotated data combined with pseudo-labelled data for training can substantially narrow the performance gap between CT- and MRI-based WMH segmentation. The framework proposed provides a generalisable solution that underscores the practical viability of CT for evaluating WMH burden in clinical and research scenarios--particularly where MRI is unavailable or contraindicated--thereby broadening access to small-vessel disease assessment.
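As a hedged sketch of two of the intensity steps named in the pipeline (CT windowing and z-score normalisation); the 0-80 HU window below is a common brain soft-tissue window and is an assumption, not necessarily the paper's choice:

```python
# Sketch: CT windowing followed by z-score normalisation within the brain.
import numpy as np

def preprocess_ct(hu, brain_mask, lo=0.0, hi=80.0):
    """hu: CT volume in Hounsfield units; brain_mask: boolean, from skull stripping."""
    windowed = np.clip(hu, lo, hi)                            # CT windowing
    inside = windowed[brain_mask]
    z = (windowed - inside.mean()) / (inside.std() + 1e-8)    # z-score over brain voxels
    return z * brain_mask                                     # zero outside the brain
```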

9
Deep Neural Patchworks Predict Renal Imaging Biomarkers from Non-Contrast MRI via Knowledge Transfer from Arterial-Phase Contrast-Enhanced MRI

Kästingschäfer, K. F.; Fink, A.; Rau, S.; Reisert, M.; Kellner, E.; Nolde, J. M.; Kottgen, A.; Sekula, P.; Bamberg, F.; Russe, M. F.

2026-02-26 radiology and imaging 10.64898/2026.02.24.26346961 medRxiv
Top 0.1%
3.1%

Rationale and Objectives: Contrast-enhanced (CE) MRI provides clear corticomedullary contrast for renal compartment delineation but may be contraindicated or undesirable in routine practice. We aimed to enable automated extraction of renal imaging biomarkers from routine non-contrast-enhanced (NCE) T1-weighted MRI by transferring CE-derived compartment labels. Materials and Methods: This retrospective single-center study (January 2017 to December 2021) included 200 participants with paired arterial-phase CE and NCE T1-weighted MRI. Cortex, medulla, and sinus were manually segmented on CE MRI and rigidly transferred to NCE MRI to provide voxel-level reference labels. A hierarchical 3D Deep Neural Patchworks model was trained on 100 examinations (90 training/10 validation) and evaluated on an independent test set of 100 examinations using the transferred CE masks on NCE as reference. Performance was assessed using Dice similarity of segmentations, and biomarker agreement was assessed using volumes and surface areas (Pearson/Spearman, MAE, Lin's CCC, and Bland-Altman). Results: Whole-kidney segmentation Dice was 0.950 (left) and 0.953 (right). Total kidney volume showed high agreement with minimal bias (MAE 8.76 mL, 2.5% of mean; CCC 0.983; bias -1.56 mL; 95% limits of agreement -28.81 to 25.69 mL). Cortex volume was modestly overestimated and medulla volume underestimated, shifting predicted compartment fractions toward cortex (74.7% vs. 72.1% in ground truth; medulla 21.5% vs. 24.3%; sinus 3.8% vs. 3.6%). Sinus volume maintained high concordance despite higher Dice dispersion. Surface area was systematically underestimated with low concordance. Conclusion: CE-supervised knowledge transfer enables accurate, well-calibrated kidney volumetry from routine NCE MRI and supports contrast-free renal biomarker extraction. Surface area estimation remains challenging. Take-home Messages: (1) CE-supervised label transfer enables accurate, well-calibrated contrast-free kidney volumetry on routine non-contrast T1-weighted MRI. (2) Compartment volumetry is feasible but shows systematic cortex overestimation and medulla underestimation; surface area remains non-interchangeable due to boundary uncertainty.
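For reference, a minimal sketch of the two agreement statistics named above, Lin's concordance correlation coefficient and Bland-Altman bias with 95% limits of agreement, for predicted versus reference volumes:

```python
# Sketch: Lin's CCC and Bland-Altman agreement for paired measurements.
import numpy as np

def lins_ccc(x, y):
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                  # population variances
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

def bland_altman(x, y):
    d = x - y
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)
    return bias, bias - half_width, bias + half_width  # bias, lower LoA, upper LoA
```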

10
On the assessment of deep-learning based super-resolution in small datasets of human brain MRI scans

Loeffen, D. W. M.; Rijpma, A.; Bartels, R. H. M. A.; Vinke, R. S.

2026-02-17 radiology and imaging 10.64898/2026.02.16.26346392 medRxiv
Top 0.1%
3.1%

Deep-learning based super-resolution has shown promise for enhancing the spatial resolution of brain magnetic resonance images, which may help visualize small anatomical structures more clearly. However, when only limited training data are available, it remains uncertain which model assessment method provides the most reliable estimate of out-of-sample performance. In this study, three widely used assessment strategies (three-way holdout, k-fold cross-validation, and nested cross-validation) were compared for evaluating the performance of such models in small datasets. Across 30 iterations, we randomly selected subsets of 20 T2-weighted images from the 1,113 scans of the Human Connectome Project. Each subset was used to train a model and estimate performance using the three methods. The ground truth error was computed from the remaining images. The assessment error is the difference between the estimated error and the ground truth error. The median assessment errors were 0.11, -0.13, and -0.32 for three-way holdout, k-fold cross-validation, and nested cross-validation, respectively, with the cross-validation methods showing considerably smaller dispersions. Nested cross-validation selected fewer epochs, indicating more conservative model selection, but required substantially greater computational time, over three times longer than three-way holdout and more than twenty times longer than k-fold cross-validation. Our findings suggest that k-fold cross-validation offers the most favourable balance between accuracy, stability, and computational feasibility in small datasets. Further research is needed to determine how model complexity, dataset size, and the number of cross-validation folds influence assessment accuracy.
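To make the "assessment error" concept concrete, a hedged sketch: estimate the error by k-fold cross-validation on a small subset, then subtract the ground-truth error measured on the large held-out pool. The training and error functions below are placeholders, not the paper's super-resolution model:

```python
# Sketch: assessment error = estimated error - ground-truth error.
import numpy as np
from sklearn.model_selection import KFold

def assessment_error(train_fn, error_fn, X_small, y_small, X_pool, y_pool, k=5):
    """train_fn(X, y) -> model; error_fn(model, X, y) -> scalar error."""
    fold_errors = []
    for tr, va in KFold(n_splits=k, shuffle=True, random_state=0).split(X_small):
        model = train_fn(X_small[tr], y_small[tr])
        fold_errors.append(error_fn(model, X_small[va], y_small[va]))
    estimated = np.mean(fold_errors)             # k-fold error estimate
    final_model = train_fn(X_small, y_small)     # refit on the full small subset
    ground_truth = error_fn(final_model, X_pool, y_pool)  # large held-out pool
    return estimated - ground_truth
```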

11
The NLP-to-Expert Gap in Chest X-ray AI

Fisher, G. R.

2026-03-02 radiology and imaging 10.64898/2026.02.27.26347261 medRxiv
Top 0.1%
2.7%

In previous work, we achieved state-of-the-art performance on ChestX-ray14 (ROC-AUC 0.940, F1 0.821) using pretraining diversity and clinical metric optimization. Applying the same methodology to CheXpert, we obtained similar results when using NLP-derived validation and test data--but when evaluated against expert radiologist labels, performance was only 0.75-0.87 ROC-AUC. The models had learned to match the automated NLP labeling system, not to diagnose disease. This paper documents our investigation into this failure and our suggested resolution. We identify the NLP-to-Expert generalization gap: a systematic divergence between models optimized on labels extracted from radiology reports and their agreement with board-certified radiologists. More surprisingly, we discovered that directly optimizing for small expert-labeled validation sets can be counterproductive--models with lower validation scores often generalized better to held-out expert test data. Four findings emerged: First, expert-labeled images for at least the validation and testing datasets, even if not for training, were vital in revealing the gap between NLP agreement and diagnostic accuracy. Without them, our models appeared excellent while failing to generalize to clinical judgment. Second, less training is better. Short training (1-5 epochs) outperformed extended training (60+ epochs) because longer training doesn't improve the model--it memorizes the labeler's mistakes. Third, ImageNet features are sufficient. Freezing the pretrained backbone and training only the classifier achieved 0.891 ROC-AUC--matching models with full fine-tuning. The rapid convergence we observed wasn't the model learning chest X-ray features; it was the classifier calibrating to already-sufficient visual representations. Fourth, regularization beats optimization. Label smoothing and frozen backbones--methods that prevent overfitting--outperformed direct metric optimization on small validation sets. The 200 expert-labeled validation images in CheXpert are too few to optimize directly; they are better used as a compass than a target. With these insights, we improved from 0.823 to 0.917 ROC-AUC, exceeding Stanford's official baseline (0.907).
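As a hedged sketch of two of the regularization choices described above, a frozen ImageNet backbone with a trainable classifier head, plus label smoothing applied to multi-label targets; the DenseNet-121 architecture and the epsilon value are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch: frozen pretrained backbone + label-smoothed BCE for
# multi-label chest X-ray classification.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.densenet121(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False                      # freeze pretrained features
# The replacement head is created after freezing, so it stays trainable.
backbone.classifier = nn.Linear(backbone.classifier.in_features, 14)

bce = nn.BCEWithLogitsLoss()

def smoothed_bce(logits, targets, eps=0.1):
    # Pull hard 0/1 labels toward 0.5 so the model cannot memorize
    # noisy NLP-derived labels with full confidence.
    soft = targets * (1 - eps) + 0.5 * eps
    return bce(logits, soft)
```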

12
Normal is All You Need: A Symmetry-Informed Inverse Learning Foundation Model for Neuroimaging Diagnostics

Wang, S.; Ayubcha, C.; Hua, Y.; Beam, A.

2026-04-12 radiology and imaging 10.64898/2026.04.10.26350553 medRxiv
Top 0.1%
2.7%

Background: Developing generalizable neuroimaging models is often hindered by limited labeled data, which has led to increased interest in unsupervised inverse learning. Existing approaches often neglect geometric principles and struggle with diverse pathologies. We propose a symmetry-informed inverse learning foundation model to address these shortcomings for robust and efficient anomaly detection in brain MRI. Methods: Our framework employs a reconstruction-to-embedding pipeline, trained exclusively on healthy brain MRI slices. A 2D U-Net uses a novel, symmetry-aware masking strategy to reconstruct a disorder-free slice. Difference maps are embedded into a 1024-dimensional latent space via a Beta-VAE. Anomaly scoring is performed using the Mahalanobis distance. We evaluated generalization by fine-tuning on external lesion datasets, BraTS Africa (SSA), and the ADNI-derived Alzheimer disease cohort (Alz). Results: On the source metastasis (Mets) dataset, the framework achieved high performance (AB1+MSE: 99.28% accuracy, 99.79% sensitivity). Generalization to the external lesion dataset (SSA) was robust, with the Symmetry ROC configuration achieving 91.93% accuracy. Transfer to the Alzheimer dataset (Alz) was more challenging, achieving a peak accuracy of 70.54% with a high false-positive rate, suggesting difficulty in separating subtle, diffuse changes. Conclusion: The symmetry-informed inverse learning framework establishes a robust foundation model for neuroimaging, showing strong performance for focal lesions and successful generalization under domain shift. Limitations in diffuse neurodegeneration underscore the necessity for richer representations and multimodal integration to improve future foundation models.
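For context, a minimal sketch of the anomaly-scoring step: fit a Gaussian to embeddings of healthy scans, then score new embeddings by Mahalanobis distance. The regularization constant is an assumption added for numerical stability:

```python
# Sketch: Mahalanobis anomaly scoring over latent embeddings of healthy data.
import numpy as np

def fit_gaussian(healthy_embeddings, reg=1e-4):
    """healthy_embeddings: (n, 1024) latent vectors from healthy training data."""
    mu = healthy_embeddings.mean(axis=0)
    cov = np.cov(healthy_embeddings, rowvar=False)
    cov += reg * np.eye(cov.shape[0])            # ridge for numerical stability
    return mu, np.linalg.inv(cov)

def mahalanobis(z, mu, cov_inv):
    d = z - mu
    return float(np.sqrt(d @ cov_inv @ d))       # larger = more anomalous
```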

13
Impact of Image Bit Depth Reduction on Deep Learning Performance in Chest Radiograph Analysis: A Multi-institutional Study

Takita, H.; Mitsuyama, Y.; Walston, S. L.; Saito, K.; Sugibayashi, T.; Okamoto, M.; Suh, C. H.; Ueda, D.

2026-03-09 radiology and imaging 10.64898/2026.03.07.26347853 medRxiv
Top 0.1%
2.3%

Purpose: Medical imaging typically generates 12- to 16-bit formats, yet conversion to 8-bit is often required. While deep learning has been widely explored in medical imaging, the influence of image bit depth on model performance is not fully understood. This study evaluates the impact of conversion from 16-bit to 8-bit for sex, age, and obesity classification using deep learning. Materials and Methods: In this retrospective, multi-institutional study, we analyzed 100,002 chest radiographs from 48,047 participants across three institutions. Three convolutional neural network architectures (ResNet52, EfficientNetB2, and ConvNeXtSmall) were trained on both 16-bit and 8-bit versions of the images. Model performance was evaluated using internal test datasets, randomly split multiple times, and an external test dataset. Statistical analysis included paired comparisons of area under the receiver operating characteristic curve (AUC-ROC) values, with Bonferroni correction for multiple comparisons. Results: Across all architectures and classification tasks, differences between 16-bit and 8-bit model performance were minimal (mean differences ranging from -0.218% to 0.184%). Statistical analyses revealed no significant differences in AUC-ROC values between bit depths for any model-task combination (all p-values > 0.05 after Bonferroni correction). Effect sizes were small to moderate (Cohen's d ranging from -0.415 to 0.391). Conclusion: Reducing image bit depth from 16-bit to 8-bit does not significantly impact the performance of deep learning models in chest radiograph analysis. These findings suggest that 8-bit images can be used for deep learning applications in medical imaging without compromising model performance, potentially allowing for more efficient data storage and processing.
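As a hedged illustration of the conversion under study, a simple per-image min-max rescale from 16-bit to 8-bit; this is one common mapping, not necessarily the exact one the authors used:

```python
# Sketch: 16-bit -> 8-bit conversion by per-image min-max rescaling.
import numpy as np

def to_8bit(img16):
    """img16: uint16 radiograph array."""
    img = img16.astype(np.float64)
    lo, hi = img.min(), img.max()
    scaled = (img - lo) / (hi - lo + 1e-12)      # normalize to [0, 1]
    return (scaled * 255).round().astype(np.uint8)
```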

14
Transient Magnetic Resonance Elastography: a method to measure the mechanics of the active heart

Barbero-Mota, M.; Annio, G.; Rucher, G.; Martorell, J.

2026-03-04 bioengineering 10.64898/2026.03.02.708995 medRxiv
Top 0.1%
2.0%

Myocardial biomechanics are a biomarker for multiple cardiac pathologies. However, the rapid and complex motion of the heart hampers accurate measurement of tissue stiffness. Current in vivo methods for evaluating myocardial mechanical health are either highly invasive or can only provide a global surrogate of heart function, as they suffer from poor spatiotemporal resolution. We propose a new in vivo technique, transient magnetic resonance elastography (tMRE), to assess dynamic cardiac biomechanics. tMRE quantifies local shear wave speed as a proxy for myocardial stiffness at user-defined times within the cardiac cycle. We report proof-of-concept results in which we probe the septum of 4 different healthy rat specimens at 3 physiologically distinct cardiac phases. We provide apparent speed measurements for early systole, mid-late systole, and early diastole that match the values expected from the physiological mechanics of the cardiac cycle. We correct for non-negligible geometrical biases using literature results and report true stiffness values where possible. Finally, we validate tMRE in phantom experiments.
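For context, under the standard idealization of a linear elastic, isotropic, locally homogeneous medium (an idealization assumed here for illustration; the paper additionally corrects geometric biases before reporting true stiffness), the measured shear wave speed maps to shear stiffness via

\[ \mu = \rho \, c_s^{2} \]

where \(\mu\) is the shear modulus, \(\rho\) the tissue density, and \(c_s\) the local shear wave speed.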

15
DIA-PINN: A physics-informed machine learning method to estimate global intrinsic diastolic chamber properties of the left ventricle from pressure-volume data

Fernandez Topham, J.; Guerrero Hurtado, M.; del Alamo, J. C.; Bermejo, J.; Martinez Legazpi, P.

2026-03-06 cardiovascular medicine 10.64898/2026.03.02.26347245 medRxiv
Top 0.1%
1.8%

Background: Pressure-volume (PV) loop analysis remains the gold standard for assessing the intrinsic global diastolic properties of the left ventricle (LV). Traditional fitting techniques rely on local, phase-constrained fittings and are limited due to their sensitivity to noise, landmark selection, violation of assumptions, and non-convergence. Objective: To develop and validate DIA-PINN, a physics-informed neural network (PINN) framework capable of calculating intrinsic diastolic properties of the LV from measured instantaneous PV data, combining mechanistic interpretability with machine learning flexibility. Methods: Instantaneous LV diastolic pressure was modeled as the sum of 1) time-dependent relaxation-related pressure and 2) volume-dependent recoil and stiffness-related pressures. DIA-PINN was trained using time, LV pressure, and volume as inputs, enforcing data fidelity, model consistency, and physiological plausibility within the loss function. Performance was evaluated in 4,000 Monte Carlo simulations of LV PV-loops, and in clinical data from 59 patients who underwent catheterization (39 with heart failure and normal ejection fraction and 20 controls). DIA-PINN-derived indices were compared to those obtained from a previously validated global optimization method (GOM). Results: On the simulation data, DIA-PINN accurately recovered all constitutive indices (intraclass correlation coefficients near unity) and improved upon GOM performance. On the clinical data, diastolic indices derived using DIA-PINN strongly correlated with GOM estimates (R>0.90, p<0.001) but were insensitive to initialization. DIA-PINN performed best under vena cava occlusion, as varying preload improved parameter identifiability. Conclusions: When applied to instantaneous pressure-volume data, a generalizable PINN framework, DIA-PINN, provides an improved method for assessing global intrinsic diastolic properties of cardiac chambers. New & Noteworthy: Our work introduces DIA-PINN, a physics-informed neural network framework to process instantaneous ventricular pressure-volume data, solving a mechanistic model of diastole with machine learning techniques. Compared to current conventional or optimization-based approaches, the PINN provides the most reliable estimates of diastolic stiffness, relaxation, and elastic recoil, insensitive to initialization. By embedding physiological constraints into network training, this approach achieves robust, interpretable, and clinically applicable quantification of gold-standard metrics of intrinsic global diastolic chamber properties.
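To make the modeling structure concrete, here is a schematic, with assumed notation, of the pressure decomposition and composite loss described in the abstract; the specific functional forms and weights are illustrative, not the authors' published choices:

\[ P(t, V) = P_{\mathrm{relax}}(t) + P_{\mathrm{passive}}(V) \]

\[ \mathcal{L} = \lambda_{\mathrm{data}} \lVert P_{\mathrm{pred}} - P_{\mathrm{meas}} \rVert^{2} + \lambda_{\mathrm{model}} \lVert P_{\mathrm{pred}} - P_{\mathrm{relax}} - P_{\mathrm{passive}} \rVert^{2} + \lambda_{\mathrm{phys}} \, \Pi_{\mathrm{plaus}} \]

where the three terms enforce data fidelity, model consistency, and physiological plausibility, respectively.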

16
Brain-SAM: A SAM-based Model Tailored for Brain MRI Lesion Segmentation

Pan, Y.; Yuan, X.; Liu, H.; Yang, Y.; Kang, G.

2026-02-03 radiology and imaging 10.64898/2026.01.30.26345164 medRxiv
Top 0.1%
1.8%

Magnetic resonance imaging (MRI) is a cornerstone of modern neuroimaging, where accurate segmentation of brain structures and lesions is essential for diagnosis, treatment planning, and clinical research. However, most current foundation models are trained on mixed-organ datasets, while the anatomical structures of the brain differ substantially from those of other organs such as the lungs and kidneys. As a result, these models often struggle to adapt to the distinctive characteristics of brain tissue. In this work, we present Brain-SAM, a model tailored for brain MRI segmentation. Brain-SAM extends the Segment Anything Model 2 (SAM2) framework by enabling the Hiera encoder to directly process 3D volumetric data and introducing a UNETR-inspired decoder for hierarchical feature decoding. The model preserves the interactive segmentation paradigm of SAM while also supporting fully automatic segmentation. Trained on multiple brain MRI datasets covering brain tumors, stroke, and epilepsy, Brain-SAM demonstrated superior performance to state-of-the-art methods. Compared with nnU-Net, it achieved Dice score improvements of 22%, 9%, and 6% on epileptic lesions, brain metastases, and meningiomas, respectively. Notably, Brain-SAM showed clear advantages in small-lesion segmentation, achieving 15%-18% higher Dice compared with other strong baseline models. We believe that Brain-SAM may offer a useful pre-trained model for downstream brain MRI analysis tasks, and could contribute to future research and clinical applications. Our code and models are available at https://github.com/DLbrainsam/Brain-SAM.

17
Foundation Model Robustness to Technical Acquisition Parameters in Chest X-Ray AI: A Multi-Architecture Comparative Study with External Validation

Farquhar, H.

2026-01-27 radiology and imaging 10.64898/2026.01.25.26344809 medRxiv
Top 0.1%
1.8%

Background: Foundation models have emerged as a promising paradigm for medical imaging AI [7], with claims of improved generalization and reduced bias. However, their robustness to technical acquisition parameters remains unexplored. We evaluated whether foundation models exhibit greater robustness to chest radiograph view type (anteroposterior [AP] versus posteroanterior [PA]) compared to traditional convolutional neural networks. Methods: We compared four model architectures on the RSNA Pneumonia Detection Challenge dataset (n=26,684 images) and externally validated on the NIH ChestX-ray14 dataset (n=112,120 images): DenseNet-121 (supervised CNN), BiomedCLIP (vision-language model trained on 15 million biomedical image-text pairs), RAD-DINO (self-supervised model trained on 5+ million radiographs), and CheXzero (vision-language model trained on MIMIC-CXR chest radiographs). The primary outcome was the sensitivity gap between AP and PA views, with bootstrap confidence intervals and permutation testing. Results: On RSNA, CheXzero showed the smallest gap (14.3%, 95% CI: 11.2-17.5%), followed by RAD-DINO (25.2%, 22.6-27.9%), DenseNet-121 (35.7%, 32.9-38.7%), and BiomedCLIP (36.1%, 33.5-39.0%). However, on external validation (NIH), model rankings reversed completely: RAD-DINO demonstrated the smallest gap (22.3%, 95% CI: 21.0-23.6%), while CheXzero's gap increased dramatically to 48.9% (95% CI: 47.7-50.1%). Domain-specific training provided robustness within the training domain but failed to generalize. On PA-view pneumonia cases in NIH, 31% were missed by all four models, representing a systematic blind spot. View type explained 61-100% of performance variance across models on both datasets, compared to 0-38% for age and less than 4% for sex. Conclusions: Foundation models do not eliminate technical acquisition parameter biases in chest X-ray AI. While domain-specific training (CheXzero) provided superior robustness on internal validation, this advantage collapsed on external data. Self-supervised learning (RAD-DINO) demonstrated the most generalizable robustness, with consistent view-type gap stability across datasets with different labeling schemes (25.2% -> 22.3%, despite substantial AUC differences). These findings challenge assumptions about foundation model generalization and highlight the need for acquisition parameter auditing in AI regulatory frameworks and multi-site external validation for robustness claims.
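For reference, a hedged sketch of the primary outcome: the AP-versus-PA sensitivity gap with a bootstrap confidence interval. Operating-point selection is simplified and binary predictions are assumed; the paper's exact resampling scheme is not reproduced here:

```python
# Sketch: bootstrap CI on the AP-vs-PA sensitivity gap.
import numpy as np

def sensitivity(y_true, y_pred):
    pos = y_true == 1
    return (y_pred[pos] == 1).mean()

def view_gap_ci(y, y_hat, is_ap, n_boot=2000, seed=0):
    """y, y_hat: binary labels/predictions; is_ap: boolean view indicator."""
    rng = np.random.default_rng(seed)
    n = len(y)
    gaps = []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)               # resample cases with replacement
        yb, pb, ab = y[i], y_hat[i], is_ap[i]
        if not (yb[ab] == 1).any() or not (yb[~ab] == 1).any():
            continue                            # need positives in both views
        gaps.append(sensitivity(yb[ab], pb[ab]) - sensitivity(yb[~ab], pb[~ab]))
    return np.mean(gaps), np.percentile(gaps, [2.5, 97.5])
```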

18
Evaluating Spiking and Non-Spiking Neural Networks for Colorectal Serrated Polyp Subtype Classification

Littlefield, N.; Bao, R.; Xia, R.; Gu, Q.

2026-01-27 pathology 10.64898/2026.01.24.26344766 medRxiv
Top 0.1%
1.8%

Image classification on digital pathology images relies heavily on convolutional neural networks (CNNs), yet the behavior of alternative neural computing paradigms in this domain remains insufficiently characterized. Spiking neural networks (SNNs), which process information through event-driven spike-based dynamics, have recently become trainable at scale but have not been evaluated under standardized colorectal pathology benchmarks. This study presents the first controlled comparison of SNNs and CNNs on the Minimalist Histopathology Image Analysis (MHIST) Dataset, a widely used, publicly available benchmark released by Dartmouth-Hitchcock Medical Center and designed for reproducible evaluation of histopathology classification models. The classification task focuses on the clinically important binary distinction between hyperplastic polyps (HPs) and sessile serrated adenomas (SSAs), a challenging problem characterized by substantial inter-pathologist variability, where HPs are typically benign and SSAs represent precancerous lesions requiring closer clinical follow-up. Histologically, HPs exhibit superficial serrated architecture and elongated crypts, whereas SSAs are characterized by broad-based, often complex crypt structures with pronounced serration. A conventional ResNet-18 architecture and its spiking counterpart are evaluated under matched training and inference to isolate the effect of spiking computation. Model performance is quantified using the area under the receiver operating characteristic curve (ROC-AUC), yielding 0.817 for the conventional CNN and 0.812 for the SNN. This comparison enables a direct assessment of how spiking computation influences discriminative performance in HPs versus SSAs binary classification and provides a benchmark reference for SNNs on the MHIST dataset. The code is publicly available at https://github.com/qug125/snn-crcp.
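For context, the event-driven dynamic that distinguishes an SNN from its CNN counterpart is typically a leaky integrate-and-fire (LIF) neuron; a minimal single-timestep update is sketched below, with the decay and threshold values as assumptions (production work would typically use a library such as snnTorch rather than this hand-rolled version):

```python
# Sketch: one timestep of a leaky integrate-and-fire neuron layer.
import torch

def lif_step(v, input_current, beta=0.9, threshold=1.0):
    """v: membrane potentials; input_current: weighted inputs at this timestep."""
    v = beta * v + input_current           # leaky integration of input
    spikes = (v >= threshold).float()      # all-or-none spike events
    v = v - spikes * threshold             # soft reset where a spike fired
    return spikes, v
```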

19
Artificial Intelligence Devices for Image Analysis in Digital Pathology

Matthews, G. A.; Godson, L.; McGenity, C.; Bansal, D.; Treanor, D.

2026-03-26 pathology 10.64898/2026.03.23.26349089 medRxiv
Top 0.1%
1.7%

Background: There is increasing momentum behind the clinical implementation of AI-based software for image analysis in digital pathology. As regulations, standards, and national approaches to the clinical use of AI continue to develop, the marketplace of AI products is expanding and evolving - presenting pathologists with a multitude of devices that offer the potential to improve pathology services. Methods: To maintain pace with this changing AI device landscape, we conducted a comprehensive search for, and analysis of, commercial AI products for image analysis in digital pathology. This included CE-marked and Research Use Only (RUO) products using images with histological stains (e.g., H&E) or immunohistochemical (IHC) labelling. Product information and published clinical validation studies were assessed to understand the quality of supporting evidence on available products, and product details were compiled into a public register: https://osf.io/gb84r/overview. Results: In total, we identified and assessed 90 CE-marked and 227 RUO AI products. We found that AI products for cancer detection in prostate and breast pathology comprised a substantial portion of the marketplace for H&E image analysis, while IHC products were almost exclusively for use in breast cancer. Clinical validation studies on these products have steadily increased; however, we found that published studies were only available for just over half of H&E products and just over a quarter of IHC products. For CE-marked products, the dataset quality and diversity for AI model performance validation was highly variable, and particularly limited for IHC products. Furthermore, only a limited number of products included studies that assessed measures of clinical utility. Conclusion: As clinical deployment of AI products for image analysis in histopathology grows, there is a need for transparency, rigorous validation, and clear evidence supporting clinical utility and cost-effectiveness. Independent scrutiny of the expanding offering of AI products provides insight into the opportunities and shortcomings in this domain.

20
Hierarchical Barycentric Multimodal Representation Learning for Medical Image Analysis

Qiu, P.; An, Z.; Ha, S.; Kumar, S.; Yu, X.; Sotiras, A.

2026-04-06 neurology 10.64898/2026.04.05.26350202 medRxiv
Top 0.1%
1.7%
Show abstract

Multimodal medical image analysis exploits complementary information from multiple data sources (e.g., multi-contrast Magnetic Resonance Imaging (MRI), Diffusion Tensor Imaging (DTI), and Positron Emission Tomography (PET)) to enhance diagnostic accuracy and support clinical decision making. Central to this process is the learning of robust representations that capture both modality-invariant and modality-specific features, which can then be leveraged for downstream tasks such as MRI segmentation and normative modeling of population-level variation and individual deviations. However, learning robust and generalizable representations becomes particularly challenging in the presence of missing modalities and heterogeneous data distributions. Most existing methods address this challenge primarily from a statistical perspective, yet they lack a theoretical understanding of the underlying geometric behavior, such as how probability mass is allocated across modalities. In this paper, we introduce a generalized geometric perspective for multimodal representation learning grounded in the concept of barycenters, which unifies a broad class of existing methods under a common theoretical perspective. Building on this barycentric formulation, we propose a novel approach that leverages generalized Wasserstein barycenters with hierarchical modality-specific priors to better preserve the geometry of unimodal distributions and enhance representation quality. We evaluated our framework on two key multimodal tasks, brain tumor MRI segmentation and normative modeling, demonstrating consistent improvements over a variety of multimodal approaches. Our results highlight the potential of scalable, theoretically grounded approaches to advance robust and generalizable representation learning in medical imaging applications.
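For readers unfamiliar with the underlying object, the 2-Wasserstein barycenter of modality distributions \(\mu_1, \dots, \mu_M\) with weights \(\lambda_i\) is

\[ \nu^{\star} = \arg\min_{\nu} \sum_{i=1}^{M} \lambda_i \, W_2^{2}(\mu_i, \nu), \qquad \lambda_i \ge 0, \ \sum_i \lambda_i = 1 \]

(standard notation; how the paper's generalized, hierarchically-primed variant departs from this plain form is not reproduced here).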